Fill bad time intervals with fake data #782

Merged
merged 24 commits into from
Jan 11, 2024

matteobachetti
Member

@matteobachetti matteobachetti commented Dec 8, 2023

Depends on #754

Changes can be seen in action in StingraySoftware/notebooks#76

Also resolves #612

@matteobachetti matteobachetti marked this pull request as draft December 8, 2023 16:32
@pep8speaks

pep8speaks commented Dec 8, 2023

Hello @matteobachetti! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

Line 2254:81: E203 whitespace before ':'
Line 2255:60: E203 whitespace before ':'
Line 2266:81: E203 whitespace before ':'
Line 2267:60: E203 whitespace before ':'

Comment last updated at 2024-01-11 09:46:05 UTC


codecov bot commented Dec 8, 2023

Codecov Report

Attention: 1 lines in your changes are missing coverage. Please review.

Comparison is base (490a7c7) 96.31% compared to head (de2ee3e) 96.33%.
Report is 5 commits behind head on main.

Files                    Patch %   Lines
stingray/lightcurve.py   83.33%    1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #782      +/-   ##
==========================================
+ Coverage   96.31%   96.33%   +0.02%     
==========================================
  Files          43       43              
  Lines        8497     8548      +51     
==========================================
+ Hits         8184     8235      +51     
  Misses        313      313              


@matteobachetti matteobachetti force-pushed the fill_btis_with_fake_data branch from 9357c6a to b14a586 Compare December 15, 2023 13:36
@matteobachetti matteobachetti marked this pull request as ready for review December 15, 2023 13:38
@matteobachetti matteobachetti force-pushed the fill_btis_with_fake_data branch 2 times, most recently from ac336a8 to 7ad8a61 Compare December 31, 2023 13:17
Collaborator

@mgullik mgullik left a comment

Hi @matteobachetti,
this is a great additional tool for the software.
A few things:
In a lightcurve-like object, the filling-the-gap routine is easier to understand. We know the time resolution of the lightcurve and the routine creates time bins with a value of count / countrate.

In a StingrayTimeseries-like object, it is more complicated: how do you choose the arrival times that are used to fill the gap? From the docstring I read
"Random data are extracted by randomly repeating the values of nearby good data"
This is not extremely clear. If you repeat the values of the arrival time, you don't fill the empty gaps, do you?

More importantly, we should find a way to test that using the function fill_bad_time_intervals on a StingrayTimeseries and then making a lightcurve is equivalent to using the function fill_bad_time_intervals on the lightcurve made from the original StingrayTimeseries (with the gaps). I guess the two cases can't lead to the same numbers, because of the random function, but they should be close enough.

In a lightcurve-like object

  • what happens if the gaps are smaller than the dt?
  • can I fill only certain gaps and not others? For example, I want to fill the gaps in the first half of the light curve and not in the second half
  • can we include the possibility of filling the gap with the linear interpolation between the two edges of GTIs adjacent to the BTI? Does it make sense?

Additional tests:

  • test multiple gaps and not only one gap to fill
  • some specific comments in the code

stingray/tests/test_base.py
ev_like_filt.gti = np.asarray([[0, 498], [500, 900], [950, 1000]])
ev_new = ev_like_filt.fill_bad_time_intervals()

assert np.allclose(ev_new.gti, self.gti)
Collaborator

In this case, what happens to the time array?
In principle, ev_new.time != ev_like_filt.time, because the BTI has been filled with random arrival times, right?

Member Author

Yep

@matteobachetti matteobachetti force-pushed the fill_btis_with_fake_data branch from 7bf79e7 to 11cf78e Compare January 3, 2024 15:53
@matteobachetti
Member Author

@mgullik thanks for the thorough review, which I tried to address in my new changes. A few remaining questions, which I haven't answered yet, probably need some discussion:

> In a lightcurve-like object, the filling-the-gap routine is easier to understand. We know the time resolution of the lightcurve and the routine creates time bins with a value of count / countrate.
>
> In a StingrayTimeseries-like object, it is more complicated: how do you choose the arrival times that are used to fill the gap? From the docstring I read
> "Random data are extracted by randomly repeating the values of nearby good data"
> This is not extremely clear. If you repeat the values of the arrival time, you don't fill the empty gaps, do you?

I improved the docstring, explaining how uniformly and non-uniformly sampled data are treated differently. Basically, the only change is that times are assigned on a fixed grid for uniformly sampled data, and randomized with the same count rate as in the buffer for non-uniformly sampled data.
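
As a rough illustration of the distinction described above, here is a minimal sketch that uses only NumPy. It is not the code from this PR: `fill_gap_times`, its arguments, and the choice of placing grid points at bin centres are assumptions made for the example, and it only generates time stamps (it does not resample any other attribute).

```python
import numpy as np


def fill_gap_times(times, gap_start, gap_stop, buffer_size, dt=None, seed=None):
    """Return fake time stamps for the gap [gap_start, gap_stop).

    Hypothetical helper, not part of Stingray. If ``dt`` is given, the data
    are treated as uniformly sampled and the new times lie on a fixed grid.
    Otherwise the times are drawn at random, with the same mean rate as the
    good data found within ``buffer_size`` of the gap edges.
    """
    rng = np.random.default_rng(seed)
    times = np.asarray(times, dtype=float)

    if dt is not None:
        # Uniform sampling: bin centres on a fixed grid inside the gap.
        n_bins = int(round((gap_stop - gap_start) / dt))
        return gap_start + dt * (np.arange(n_bins) + 0.5)

    # Non-uniform sampling: estimate the rate from the buffers around the gap.
    before = (times >= gap_start - buffer_size) & (times < gap_start)
    after = (times >= gap_stop) & (times < gap_stop + buffer_size)
    rate = (before.sum() + after.sum()) / (2 * buffer_size)

    # Draw a Poisson number of events and scatter them uniformly in the gap.
    n_fake = rng.poisson(rate * (gap_stop - gap_start))
    return np.sort(rng.uniform(gap_start, gap_stop, n_fake))


# Simulated event list with a gap between t=498 and t=500.
rng = np.random.default_rng(0)
events = np.sort(rng.uniform(0, 1000, 5000))
events = events[(events < 498) | (events >= 500)]

fake = fill_gap_times(events, 498, 500, buffer_size=50, seed=1)  # random times
grid = fill_gap_times(events, 498, 500, buffer_size=50, dt=0.5)  # fixed grid
```

In the uniformly sampled case only the values attached to the grid points need to be randomized, whereas in the event-like case the time stamps themselves are the random quantity.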

> More importantly, we should find a way to test that using the function fill_bad_time_intervals on a StingrayTimeseries and then making a lightcurve is equivalent to using the function fill_bad_time_intervals on the lightcurve made from the original StingrayTimeseries (with the gaps). I guess the two cases can't lead to the same numbers, because of the random function, but they should be close enough.

Why do you think this is important? A uniformly sampled time series should behave just like a light curve, and tests in both time series and light curves all pass independently.

> In a lightcurve-like object
>
>   • what happens if the gaps are smaller than the dt?

The light curve machinery should cover this: bins partially outside GTIs are just treated as if they were outside GTIs.
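
A minimal sketch of that rule, assuming bin centres in `time`, a constant bin width `dt`, and GTIs given as `[start, stop]` pairs. `bins_inside_gtis` is a hypothetical helper written for this example, not a Stingray function.

```python
import numpy as np


def bins_inside_gtis(time, dt, gti):
    """Boolean mask of the bins that lie entirely inside some GTI.

    Hypothetical helper. ``time`` holds bin centres, ``dt`` is the bin width,
    and ``gti`` is an (N, 2) array-like of [start, stop] pairs.
    """
    time = np.asarray(time, dtype=float)
    gti = np.asarray(gti, dtype=float)
    lo = time - dt / 2  # left edge of each bin
    hi = time + dt / 2  # right edge of each bin
    mask = np.zeros(time.size, dtype=bool)
    for start, stop in gti:
        mask |= (lo >= start) & (hi <= stop)
    return mask


# Ten 1-s bins; the gap between 4.2 and 4.6 is shorter than dt.
time = np.arange(0.5, 10, 1.0)
gti = [[0, 4.2], [4.6, 10]]
mask = bins_inside_gtis(time, 1.0, gti)
```

Here the bin spanning 4 to 5 overlaps the short gap, so it is flagged as bad even though the gap covers only part of it.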

>   • can I fill only certain gaps and not others? For example, I want to fill the gaps in the first half of the light curve and not in the second half

I don't see a use case for this. What would be the application? I guess one could split the light curve, fill the gaps in the first half, and then join the two chunks back together.
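
For what it is worth, the bookkeeping behind a selective fill can be sketched on the GTI array alone. This is a different route from the split-and-join idea, shown here only to make the selection explicit: `merge_gtis_over_selected_btis` and its predicate argument are hypothetical names, and the sketch does not generate any fake data.

```python
import numpy as np


def merge_gtis_over_selected_btis(gti, fill_this_bti):
    """Merge adjacent GTIs across the bad intervals selected by a predicate.

    Hypothetical helper. ``fill_this_bti(start, stop)`` returns True for the
    gaps one intends to fill; the other gaps are left as they are.
    """
    gti = np.asarray(gti, dtype=float)
    merged = [list(gti[0])]
    for start, stop in gti[1:]:
        bti_start = merged[-1][1]
        if fill_this_bti(bti_start, start):
            merged[-1][1] = stop  # gap will be filled: extend the current GTI
        else:
            merged.append([start, stop])
    return np.array(merged)


gti = [[0, 498], [500, 900], [950, 1000]]
# Select only the gaps that end before t = 600 (the "first half" idea).
new_gti = merge_gtis_over_selected_btis(gti, lambda a, b: b <= 600)
```

With these inputs only the gap between 498 and 500 is selected, so the first two GTIs are merged and the gap between 900 and 950 is kept.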

>   • can we include the possibility of filling the gap with the linear interpolation between the two edges of GTIs adjacent to the BTI? Does it make sense?

We could. Again, what would be the use case? While using random data tries to preserve the statistical properties of the data set, linear interpolation would knowingly alter them.
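
The difference can be seen in a small NumPy experiment. This is a sketch under simple assumptions (Poisson counts with a mean of 100 per bin, a 20-bin gap, a 20-bin buffer on each side) and does not use the PR's code: linear interpolation produces a straight line with no bin-to-bin scatter, while resampling from nearby bins keeps the scatter of the surrounding data.

```python
import numpy as np

rng = np.random.default_rng(42)
counts = rng.poisson(100, size=200).astype(float)
gap = slice(90, 110)  # bins to be treated as a bad time interval

# Option 1: linear interpolation between the bins adjacent to the gap.
linear = counts.copy()
x_edges = [gap.start - 1, gap.stop]
linear[gap] = np.interp(np.arange(gap.start, gap.stop), x_edges, counts[x_edges])

# Option 2: resample values at random from good bins near the gap.
buffer = np.concatenate(
    [counts[gap.start - 20 : gap.start], counts[gap.stop : gap.stop + 20]]
)
random_fill = counts.copy()
random_fill[gap] = rng.choice(buffer, size=gap.stop - gap.start)

# Second differences vanish on a straight line, so their scatter measures
# how much bin-to-bin noise each filling method leaves in the gap.
scatter_linear = np.std(np.diff(linear[gap], n=2))
scatter_random = np.std(np.diff(random_fill[gap], n=2))
```

A periodogram of the interpolated light curve would therefore contain a stretch without white noise, which is the kind of alteration mentioned above.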

> Additional tests:
>
>   • test multiple gaps and not only one gap to fill

Done

>   • some specific comments in the code

I think I addressed those one by one

Member

@dhuppenkothen dhuppenkothen left a comment

Looking good! Some comments and thoughts on the limits of what I can do, which I think should probably be written down somewhere, though not necessarily here (maybe in a tutorial?)

stingray/base.py Outdated
----------------
max_length : float
    Maximum length of a bad time interval to be filled. If None, the criterion is bad
    time intervals shorter than 1/100th of the longest bad time interval.
Member

Where does 1/100 come from? That seems maybe a bit arbitrary?

Member Author

It's actually 1% of the longest good time interval. By default it is just a short length, so that we don't alter the statistical properties of the data too much.
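
A minimal sketch of that default, as described in this thread rather than as implemented in the library: the function names and the `fraction` argument are hypothetical, and the GTIs are the ones used in the test snippet quoted earlier.

```python
import numpy as np


def default_max_length(gti, fraction=0.01):
    """1% (by default) of the longest good time interval. Hypothetical helper."""
    gti = np.asarray(gti, dtype=float)
    return fraction * np.max(gti[:, 1] - gti[:, 0])


def short_btis(gti, max_length=None):
    """Return the bad time intervals shorter than ``max_length``.

    Hypothetical helper. BTIs are taken to be the gaps between consecutive
    GTIs; if ``max_length`` is None, the default rule above is used.
    """
    gti = np.asarray(gti, dtype=float)
    if max_length is None:
        max_length = default_max_length(gti)
    btis = np.column_stack([gti[:-1, 1], gti[1:, 0]])
    lengths = btis[:, 1] - btis[:, 0]
    return btis[lengths < max_length]


gti = [[0, 498], [500, 900], [950, 1000]]
limit = default_max_length(gti)  # 1% of the 498-s GTI
to_fill = short_btis(gti)
```

With these GTIs the limit is 4.98, so only the 2-s gap between 498 and 500 qualifies, and the 50-s gap between 900 and 950 is left alone.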

@matteobachetti matteobachetti force-pushed the fill_btis_with_fake_data branch from 88468da to 8248a04 Compare January 11, 2024 09:34
Member

@dhuppenkothen dhuppenkothen left a comment

LGTM! 👍 Assuming @mgullik is happy with it, too, this can get merged, I think?

@matteobachetti matteobachetti added this pull request to the merge queue Jan 11, 2024
Merged via the queue into main with commit 33647b4 Jan 11, 2024
16 checks passed
@matteobachetti matteobachetti deleted the fill_btis_with_fake_data branch January 29, 2024 20:18
Development

Successfully merging this pull request may close these issues.

Use matplotlib's object oriented interface in our crossspectrum.plot and other places
4 participants